Abstract

We address the problem of propositional logic-based abduction, i.e., the problem of 
searching for a best explanation for a given propositional observation according to a given 
propositional knowledge base. We give a general algorithm, based on the notion of projection;
then we study restrictions over the representations of the knowledge base and of the
query, and find new polynomial classes of abduction problems. 

1. Introduction 

Abduction consists in searching for a plausible explanation for a given observation. For
instance, if $p \models q$ then $p$ is a plausible explanation for the observation $q$. More generally,
abduction is the process of searching for a set of facts (the explanation, here $p$) that,
conjointly with a given knowledge base (here $p \to q$), imply a given query ($q$). This process
is also constrained by a set of hypotheses among which the explanations have to be chosen,
and by a preference criterion among them.

Abduction has proved its practical interest in many domains. For instance, it
has been used to formalize text interpretation (Hobbs et al., 1993), system diagnosis
(Coste-Marquis & Marquis, 1998; Stumptner & Wotawa, 2001) or medical diagnosis (Bylander
et al., 1989, Section 6). It is also closely related to configuration problems (Amilhastre
et al., 2002), to the ATMS/CMS (Reiter & de Kleer, 1987), to default reasoning (Selman &
Levesque, 1990) and even to induction (Goebel, 1997).

We are interested here in the complexity of propositional logic-based abduction, i.e., we 
assume both the knowledge base and the query are represented by propositional formulas. 
Even in this framework, many different formalizations have been proposed in the literature,
mainly differing in the definition of a hypothesis and that of a best explanation (Eiter
& Gottlob, 1995). We assume here that the hypotheses are the conjunctions of literals
formed upon a distinguished subset of the variables involved, and that a best explanation
is one no proper subconjunction of which is an explanation (subset-minimality criterion).

Our purpose is to exhibit new polynomial classes of abduction problems. We give a
general algorithm for finding a best explanation in the framework defined above, independently
of the syntactic form of the formulas representing the knowledge base and the
query. Then we explore the syntactic forms that allow a polynomial running time for this
algorithm. We find new polynomial classes of abduction problems, among which the one 
restricting the knowledge base to be given as a Horn DNF and the query as a positive CNF, 
and the one restricting the knowledge base to be given as an affine formula and the query as 
a disjunction of linear equations. Our algorithm also unifies several previous such results.

The note is organized as follows. We first recall the useful notions of propositional
logic (Section 2), formalize the problem (Section 3) and briefly survey previous work about 
the complexity of abduction (Section 4). Then we give our algorithm (Section 5) and 
explore polynomial classes for it (Section 6). Finally, we discuss our results and perspectives 
(Section 7). For lack of space we cannot detail proofs, but a longer version of this work, 
containing detailed proofs and examples, is available (Zanuttini, 2003). 

2. Preliminaries 

We assume a countable number of propositional variables $x_1, x_2, \ldots$ and the standard
connectives $\neg, \wedge, \vee, \oplus, \to, \leftrightarrow$. A literal is either a variable $x_i$
(positive literal) or its negation $\neg x_i$ (negative literal). A propositional formula is a
well-formed formula built on a finite number of variables and on the connectives;
$\mathit{Var}(\varphi)$ denotes the set of variables that occur in the propositional formula $\varphi$.
A clause is a finite disjunction of literals, and a propositional formula is in Conjunctive
Normal Form (CNF) if it is written as a finite conjunction of clauses. For instance,
$\varphi = (x_1 \vee \neg x_2) \wedge (\neg x_1 \vee x_2 \vee \neg x_3)$ is in CNF. The dual notions
of clause and CNF are the notions of term (finite conjunction of literals) and Disjunctive
Normal Form (DNF) (finite disjunction of terms).

An assignment to a set of variables $V$ is a set of literals $m$ that contains exactly one
literal per variable in $V$, and a model of a propositional formula $\varphi$ is an assignment $m$
to $\mathit{Var}(\varphi)$ that satisfies $\varphi$ in the usual way, where $m$ assigns 1 to $x_i$ iff
$x_i \in m$; we also write $m$ as a tuple, e.g., 0010 for $\{\neg x_1, \neg x_2, x_3, \neg x_4\}$.
We write $m[i]$ for the value assigned to $x_i$ by $m$, and $\mathcal{M}(\varphi)$ for the set of all
the models of a propositional formula $\varphi$; $\varphi$ is said to be satisfiable if
$\mathcal{M}(\varphi) \neq \emptyset$. A formula $\varphi$ is said to imply a propositional formula $\varphi'$
(written $\varphi \models \varphi'$) if $\mathcal{M}(\varphi) \subseteq \mathcal{M}(\varphi')$. More generally,
we identify sets of models with Boolean functions, and use the notations $\overline{\mathcal{M}}$
(negation), $\mathcal{M} \vee \mathcal{M}'$ (disjunction) and so on.

The notion of projection is very important for the rest of the paper. For $m$ an assignment
to a set of variables $V$ and $A \subseteq V$, write $\mathit{Select}_A(m)$ for the set of literals in
$m$ that are formed upon $A$, e.g., $\mathit{Select}_{\{x_1, x_2\}}(0110) = 01$. Projecting a set of
assignments onto a subset $A$ of its variables intuitively consists in replacing each
assignment $m$ with $\mathit{Select}_A(m)$; for the sake of simplicity however, we define the
projection of a set of models $\mathcal{M}$ to be built upon the same set of variables as
$\mathcal{M}$. This yields the following definition.

Definition 1 (projection) Let $V = \{x_1, \ldots, x_n\}$ be a set of variables, $\mathcal{M}$ a set of
assignments to $V$ and $A \subseteq V$. The projection of $\mathcal{M}$ onto $A$ is the set of assignments to $V$
$\mathcal{M}|_A = \{m \mid \exists m' \in \mathcal{M},\ \mathit{Select}_A(m') = \mathit{Select}_A(m)\}$.

For instance, let $\mathcal{M} = \{0001, 0010, 0111, 1100, 1101\}$ be a set of assignments to
$V = \{x_1, x_2, x_3, x_4\}$, and let $A = \{x_1, x_2\}$. Then it is easily seen that

$\mathcal{M}|_A = \{0000, 0001, 0010, 0011\} \cup \{0100, 0101, 0110, 0111\} \cup \{1100, 1101, 1110, 1111\}$

since $\{\mathit{Select}_A(m) \mid m \in \mathcal{M}\} = \{00, 01, 11\}$.
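
The example above can be checked mechanically. The following is a minimal brute-force sketch,
not part of the original note; the function names `select` and `project`, and the encoding of
assignments as 0/1 strings indexed by variable position, are assumptions made for illustration.

```python
from itertools import product

def select(m, idx):
    """Restrict the assignment m (a 0/1 string) to the variable positions in idx."""
    return "".join(m[i] for i in idx)

def project(models, idx, n):
    """Projection of `models` onto the variables at positions idx,
    returned as a set of assignments to all n variables (Definition 1)."""
    kept = {select(m, idx) for m in models}
    every = ("".join(bits) for bits in product("01", repeat=n))
    return {m for m in every if select(m, idx) in kept}

M = {"0001", "0010", "0111", "1100", "1101"}
A = [0, 1]  # positions of x1 and x2
proj = project(M, A, 4)
print(sorted(proj))
```

The result contains the twelve assignments whose first two bits are 00, 01 or 11; enumerating
all $2^n$ assignments is of course only meant for small examples.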

Remark that the projection of the set of models of a formula $\varphi$ onto a set of variables
$A$ is the set of models of the most general consequence of $\varphi$ that is independent of all the
variables not in $A$. Note also that the projection of $\mathcal{M}(\varphi)$ onto $A$ is the set of
models of the formula obtained from $\varphi$ by forgetting its variables not occurring in $A$. For
more details about variable forgetting and independence we refer the reader to the work by
Lang et al. (2002).

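It is useful to note some straightforward properties of projection. Let $\mathcal{M}, \mathcal{M}'$
denote two sets of assignments to the set of variables $V$, and let $A \subseteq V$. First, projection
is distributive over disjunction, i.e., $(\mathcal{M} \vee \mathcal{M}')|_A = \mathcal{M}|_A \vee \mathcal{M}'|_A$.
It is also distributive over conjunction when $\mathcal{M}$ does not depend on the variables
$\mathcal{M}'$ depends on, i.e., when there exist $A, A' \subseteq V$, $A \cap A' = \emptyset$, with
$\mathcal{M}|_A = \mathcal{M}$ ($\mathcal{M}$ does not depend on $V \setminus A$) and $\mathcal{M}'|_{A'} = \mathcal{M}'$,
$(\mathcal{M} \wedge \mathcal{M}')|_A = \mathcal{M}|_A \wedge \mathcal{M}'|_A$ holds; note that this is not true
in the general case. Note finally that in general $(\overline{\mathcal{M}})|_A$ is not the same as
$\overline{\mathcal{M}|_A}$.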
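
These properties can be observed on a tiny instance. The sketch below is an illustration of
ours (the helper names `project` and `complement` and the two-variable example are assumptions,
not taken from the note); sets of models are plain Python sets, so disjunction is union and
conjunction is intersection.

```python
from itertools import product

def project(models, idx, n):
    """Projection onto the variable positions idx, as assignments to n variables."""
    kept = {tuple(m[i] for i in idx) for m in models}
    return {m for m in product((0, 1), repeat=n)
            if tuple(m[i] for i in idx) in kept}

def complement(models, n):
    """Negation of a set of models over n variables."""
    return set(product((0, 1), repeat=n)) - set(models)

n, A = 2, [0]                 # V = {x1, x2}, A = {x1}
M1, M2 = {(0, 1)}, {(0, 0)}   # both sets depend on x2

# Distributivity over disjunction holds.
disj_ok = project(M1 | M2, A, n) == project(M1, A, n) | project(M2, A, n)

# Distributivity over conjunction fails here, since M1 and M2 share x2.
conj_ok = project(M1 & M2, A, n) == project(M1, A, n) & project(M2, A, n)

# Negation and projection do not commute either.
neg_ok = project(complement(M2, n), A, n) == complement(project(M2, A, n), n)

print(disj_ok, conj_ok, neg_ok)
```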

3. Our Model of Abduction 

We now formalize our model; for sake of simplicity, we first define abduction problems and 
then the notions of hypothesis and explanation. 

Definition 2 (abduction problem) A triple $\Pi = (\Sigma, \alpha, A)$ is called an abduction
problem if $\Sigma$ and $\alpha$ are satisfiable propositional formulas and $A$ is a set of variables with
$\mathit{Var}(\alpha), A \subseteq \mathit{Var}(\Sigma)$; $\Sigma$ is called the knowledge base of $\Pi$, $\alpha$ its
query and $A$ its set of abducibles.

Definition 3 (hypothesis, explanation) Let $\Pi = (\Sigma, \alpha, A)$ be an abduction problem. A
hypothesis for $\Pi$ is a set of literals formed upon $A$ (seen as their conjunction), and a
hypothesis $E$ for $\Pi$ is an explanation for $\Pi$ if $\Sigma \wedge E$ is satisfiable and
$\Sigma \wedge E \models \alpha$. If no proper subconjunction of $E$ is an explanation for $\Pi$, $E$ is
called a best explanation for $\Pi$.
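
Definition 3 can be read as an executable (exponential-time) check. The sketch below is only
an illustration under assumptions of ours: formulas are encoded as Python predicates over a
dictionary of truth values, hypotheses as dictionaries from abducibles to signs, and the
knowledge base $(p \to q) \wedge (r \to q)$ with query $q$ and abducibles $\{p, r\}$ is a made-up example.

```python
from itertools import combinations, product

def models(formula, variables):
    """Enumerate the models of `formula` over `variables` by brute force."""
    for bits in product([False, True], repeat=len(variables)):
        m = dict(zip(variables, bits))
        if formula(m):
            yield m

def is_explanation(sigma, alpha, variables, hyp):
    """hyp maps abducibles to signs and is read as a conjunction of literals."""
    compatible = [m for m in models(sigma, variables)
                  if all(m[v] == b for v, b in hyp.items())]
    # sigma AND hyp must be satisfiable, and all its models must satisfy alpha.
    return bool(compatible) and all(alpha(m) for m in compatible)

def is_best_explanation(sigma, alpha, variables, hyp):
    """An explanation none of whose proper subconjunctions is an explanation."""
    if not is_explanation(sigma, alpha, variables, hyp):
        return False
    items = list(hyp.items())
    return not any(is_explanation(sigma, alpha, variables, dict(sub))
                   for r in range(len(items))
                   for sub in combinations(items, r))

variables = ["p", "q", "r"]
sigma = lambda m: (not m["p"] or m["q"]) and (not m["r"] or m["q"])
alpha = lambda m: m["q"]

print(is_best_explanation(sigma, alpha, variables, {"p": True}))
print(is_best_explanation(sigma, alpha, variables, {"p": True, "r": False}))
```

Here $p$ alone is a best explanation, whereas $p \wedge \neg r$ is an explanation but not a best one.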

Note that this framework does not allow one to specify that a variable must occur unnegated
(resp. negated) in an explanation. We do not think this is a prohibitive restriction, since
abducibles are intuitively meant to represent the variables whose values can be, e.g.,
modified, observed or repaired, whatever their sign in an explanation. But we note that
it is a restriction, and that a more general framework can be defined where the abducibles
are literals and the hypotheses are conjunctions of abducibles (Marquis, 2000).

We are interested in the computational complexity of computing a best explanation for
a given abduction problem, or asserting there is none at all. Following the usual model,
we establish complexities with respect to the size of the representations of $\Sigma$ and $\alpha$ and to
the number of abducibles; for hardness results, the following associated decision problem is
usually considered: is there at least one explanation for $\Pi$? Obviously, if this latter problem
is hard, then the function problem also is.

4. Previous Work 

The main general complexity results about propositional logic-based abduction with
subset-minimality preference were stated by Eiter and Gottlob (1995). The authors show that
deciding whether a given abduction problem has a solution at all is a $\Sigma_2^P$-complete problem,
even if $A \cup \mathit{Var}(\alpha) = \mathit{Var}(\Sigma)$ and $\Sigma$ is in CNF. As stated as well by Selman and
Levesque (1990), they also establish that this problem becomes "only" NP-complete when $\Sigma$ is
Horn, and even acyclic Horn. Note that when SAT and deduction are polynomial with $\Sigma$ the
problem is obviously in NP.

In fact, very few classes of abduction problems are known to be polynomial for the
search for explanations. As far as we know, the only such classes are those defined by the
following restrictions (once again we refer the reader to the references for definitions):

• $\Sigma$ is in 2CNF and $\alpha$ is in 2DNF (Marquis, 2000, Section 4.2)

• $\Sigma$ is given as a monotone CNF and $\alpha$ as a clause (Marquis, 2000, Section 4.2)

• $\Sigma$ is given as a definite Horn CNF and $\alpha$ as a conjunction of positive literals (Selman
& Levesque, 1990; Eiter & Gottlob, 1995)

• $\Sigma$ is given as an acyclic Horn CNF with pseudo-completion unit-refutable and $\alpha$ is a
variable (Eshghi, 1993)

• $\Sigma$ has bounded induced kernel width and $\alpha$ is given as a literal (del Val, 2000)

• $\Sigma$ is represented by its set of characteristic models (with respect to a particular basis)
and $\alpha$ is a variable (Khardon & Roth, 1996); note that a set of characteristic models
is not a propositional formula, but that the result is nevertheless similar to the other ones

• $\Sigma$ is represented by the set of its models, or, equivalently, by a DNF with every variable
occurring in each term, and $\alpha$ is any propositional formula.

The first two classes are proved polynomial with a general method for solving abduction
problems with the notion of prime implicants, the last one is obvious since all the information
is explicitly given in the input, and the four others are exhibited with ad hoc algorithms.

Let us also mention that Amilhastre et al. (2002) study most of the related problems in 
the more general framework of multivalued theories instead of propositional formulas, i.e., 
when the domain of the variables is not restricted to be $\{0, 1\}$. The authors mainly show,
as far as this note is concerned, that deciding whether there exists an explanation is still a
$\Sigma_2^P$-complete problem (Amilhastre et al., 2002, Table 1).

Note that not all these results are stated in our exact framework in the papers cited 
above, but that they all still hold in it. Let us also mention that the problem of enumerating 
all the best explanations for a given abduction problem is of great interest; Eiter and Makino 
(2002) provide a discussion and some first results about it, mainly in the case when the 
knowledge base is Horn. 

5. A General Algorithm 

We now give the principle of our algorithm. Let us stress first that, as well as, e.g., Marquis' 
construction (Marquis, 2000, Section 4.2), its outline matches point by point the definition 
of a best explanation; our ideas and Marquis' are anyway rather close. 

We are first interested in the hypotheses in which every abducible $x \in A$ occurs (either
negated or unnegated); let us call them full hypotheses. Note indeed that every explanation
$E$ for an abduction problem is a subconjunction of a full explanation $F$; since $E$ is
by definition such that $\Sigma \wedge E$ is satisfiable and implies $\alpha$, it suffices to let $F$ be
$\mathit{Select}_A(m)$ for a model $m$ of $\Sigma \wedge E \wedge \alpha$. Minimization of $F$ will be discussed
later on.

Proposition 1 Let $\Pi = (\Sigma, \alpha, A)$ be an abduction problem, and $F$ a full hypothesis of $\Pi$.
Then $F$ is an explanation for $\Pi$ if and only if there exists an assignment $m$ to $\mathit{Var}(\Sigma)$ with
$F = \mathit{Select}_A(m)$ and $m \in \mathcal{M}(\Sigma) \wedge \overline{(\mathcal{M}(\Sigma \wedge \overline{\alpha}))|_A}$.

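Proof Assume first $F$ is an explanation for $\Pi$. Then (i) there exists an assignment $m$ to
$\mathit{Var}(\Sigma)$ with $m \models \Sigma \wedge F$, thus $F = \mathit{Select}_A(m)$ and $m \in \mathcal{M}(\Sigma)$, and
(ii) $\Sigma \wedge F \models \alpha$, i.e., $\Sigma \wedge F \wedge \overline{\alpha}$ is unsatisfiable, thus
$F \notin \{\mathit{Select}_A(m) \mid m \in \mathcal{M}(\Sigma \wedge \overline{\alpha})\}$, thus
$m \notin (\mathcal{M}(\Sigma \wedge \overline{\alpha}))|_A$, thus $m \in \overline{(\mathcal{M}(\Sigma \wedge \overline{\alpha}))|_A}$.
Conversely, if $m \in \mathcal{M}(\Sigma) \wedge \overline{(\mathcal{M}(\Sigma \wedge \overline{\alpha}))|_A}$ let
$F = \mathit{Select}_A(m)$. Then we have (i) since $m \in \mathcal{M}(\Sigma)$, $\Sigma \wedge F$ is satisfiable, and
(ii) since $m \notin (\mathcal{M}(\Sigma \wedge \overline{\alpha}))|_A$, there is no
$m' \in \mathcal{M}(\Sigma \wedge \overline{\alpha})$ with $\mathit{Select}_A(m') = F$, thus
$\Sigma \wedge F \wedge \overline{\alpha}$ is unsatisfiable, thus $\Sigma \wedge F \models \alpha$. □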

Thus we have characterized the full explanations for a given abduction problem. Now
minimizing such an explanation $F$ is not a problem, since the following greedy procedure,
given by Selman and Levesque (1990), reduces $F$ into a best explanation for $\Pi$:

For every literal $\ell \in F$ do
    If $\Sigma \wedge (F \setminus \{\ell\}) \models \alpha$ then $F \leftarrow F \setminus \{\ell\}$ endif;
Endfor;

Note that depending on the order in which the literals $\ell \in F$ are considered the result may
be different, but that in all cases it will be a best explanation for $\Pi$.
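
The greedy procedure and its dependence on the order of the literals can be sketched as
follows. This is an illustration of ours, not the authors' implementation: entailment is
decided by brute-force enumeration, formulas are Python predicates, and the knowledge base
$(p \to q) \wedge (r \to q)$ with query $q$ is an assumed example.

```python
from itertools import product

def entails(sigma, hyp, alpha, variables):
    """True iff every model of sigma that agrees with hyp satisfies alpha."""
    for bits in product([False, True], repeat=len(variables)):
        m = dict(zip(variables, bits))
        if sigma(m) and all(m[v] == b for v, b in hyp.items()) and not alpha(m):
            return False
    return True

def minimize(sigma, alpha, variables, full_explanation, order):
    """Greedy minimization: drop each literal whose removal keeps the entailment."""
    hyp = dict(full_explanation)
    for v in order:
        trial = {u: b for u, b in hyp.items() if u != v}
        if entails(sigma, trial, alpha, variables):
            hyp = trial
    return hyp

variables = ["p", "q", "r"]
sigma = lambda m: (not m["p"] or m["q"]) and (not m["r"] or m["q"])
alpha = lambda m: m["q"]
full = {"p": True, "r": True}   # a full explanation over A = {p, r}

first = minimize(sigma, alpha, variables, full, order=["p", "r"])
second = minimize(sigma, alpha, variables, full, order=["r", "p"])
print(first, second)
```

Satisfiability need not be rechecked inside the loop, since removing literals from a
hypothesis consistent with the knowledge base keeps it consistent.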

Finally, we can give our general algorithm for computing a best explanation for a given
abduction problem $\Pi = (\Sigma, \alpha, A)$; its correctness follows directly from Proposition 1:

$\Sigma' \leftarrow$ a propositional formula with $\mathcal{M}(\Sigma') = \mathcal{M}(\Sigma) \wedge \overline{(\mathcal{M}(\Sigma \wedge \overline{\alpha}))|_A}$;
If $\Sigma'$ is unsatisfiable then return "No explanation";
Else
    $m \leftarrow$ a model of $\Sigma'$;
    $F \leftarrow \mathit{Select}_A(m)$;
    minimize $F$;
    return $F$;
Endif;
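
For intuition, the algorithm can be run directly on explicit sets of models. The sketch
below is a semantic, exponential-time illustration under assumptions of ours (predicates as
formulas, the function name `best_explanation`, and the example knowledge base); the
polynomial classes studied next arise precisely when these steps are carried out on compact
representations instead.

```python
from itertools import product

def all_assignments(variables):
    for bits in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, bits))

def best_explanation(sigma, alpha, variables, abducibles):
    """Return a best explanation as a dict abducible -> sign, or None."""
    select = lambda m: tuple(m[v] for v in abducibles)
    sigma_models = [m for m in all_assignments(variables) if sigma(m)]
    # Projection onto A of the models of sigma AND NOT alpha.
    blocked = {select(m) for m in sigma_models if not alpha(m)}
    # Models of sigma lying outside that projection (the models of Sigma').
    candidates = [m for m in sigma_models if select(m) not in blocked]
    if not candidates:
        return None  # "No explanation"
    hyp = {v: candidates[0][v] for v in abducibles}
    for v in abducibles:  # greedy minimization
        trial = {u: b for u, b in hyp.items() if u != v}
        if all(alpha(m) for m in sigma_models
               if all(m[u] == b for u, b in trial.items())):
            hyp = trial
    return hyp

variables = ["p", "q", "r"]
sigma = lambda m: (not m["p"] or m["q"]) and (not m["r"] or m["q"])
alpha = lambda m: m["q"]

result = best_explanation(sigma, alpha, variables, ["p", "r"])
nothing = best_explanation(sigma, alpha, variables, [])
print(result, nothing)
```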

6. Polynomial Classes 

We now explore the new polynomial classes of abduction problems that our algorithm allows
us to exhibit. Throughout the section, $n$ denotes the number of variables in $\mathit{Var}(\Sigma)$.

6.1 Affine Formulas 

A propositional formula is said to be affine (or in XOR-CNF) (Schaefer, 1978; Kavvadias &
Sideri, 1998; Zanuttini, 2002) if it is written as a finite conjunction of linear equations over
the two-element field, e.g., $\varphi = (x_1 = 1) \wedge (x_1 \oplus x_2 \oplus x_4 = 0)$. As can be seen, equations
play the same role in affine formulas as clauses do in CNFs; roughly, affine formulas represent
conjunctions of parity or equivalence constraints. This class proves interesting for knowledge
representation, since on one hand it is tractable for most of the common reasoning tasks, and






Bruno Zanuttini 



on the other hand the affine approximations of a knowledge base can be made very small and
are efficiently learnable (Zanuttini, 2002). We show that projecting an affine formula onto
a subset of its variables is quite easy too, enabling our algorithm to run in polynomial time.
The proof of the following lemma is easily obtained with Gaussian elimination (Curtis, 1984):
triangulate $\varphi$ with the variables in $A$ put rightmost, and then keep only those equations
formed upon $A$; full details are given in the technical report version (Zanuttini, 2003).

Lemma 1 Let $\varphi$ be an affine formula containing $k$ equations, and $A \subseteq \mathit{Var}(\varphi)$. Then
an affine formula $\psi$ with $\mathcal{M}(\psi) = (\mathcal{M}(\varphi))|_A$ and containing at most $k$ equations
can be computed in time $O(k^2 |\mathit{Var}(\varphi)|)$.
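
One possible way to realize this projection is to eliminate the variables outside $A$ one at
a time, which is a variant of the triangulation described above rather than a transcription
of it. The sketch below makes assumptions of ours: an equation is a pair (set of variables,
right-hand side), and `project_affine` is a hypothetical name; no attempt is made to match
the stated time bound.

```python
def project_affine(equations, keep):
    """equations: list of (variables, rhs), meaning XOR of variables = rhs (mod 2).
    Returns equations over `keep` only, describing the projection onto `keep`."""
    eqs = [(frozenset(vs), rhs) for vs, rhs in equations]
    to_eliminate = set().union(*(vs for vs, _ in eqs)) - set(keep)
    for v in to_eliminate:
        pivot = next((e for e in eqs if v in e[0]), None)
        if pivot is None:
            continue
        rest = [e for e in eqs if e is not pivot]
        # Add the pivot to every other equation mentioning v, then drop the pivot:
        # it can always be satisfied by choosing the value of v accordingly.
        eqs = [(vs ^ pivot[0], rhs ^ pivot[1]) if v in vs else (vs, rhs)
               for vs, rhs in rest]
    # Trivial equations 0 = 0 are dropped; 0 = 1 is kept and signals unsatisfiability.
    return [(vs, rhs) for vs, rhs in eqs if vs or rhs]

# phi = (x1 = 1) AND (x1 XOR x2 XOR x4 = 0), the example from the text.
phi = [({"x1"}, 1), ({"x1", "x2", "x4"}, 0)]
print(project_affine(phi, {"x2", "x4"}))
print(project_affine(phi, {"x1", "x2"}))
```

On this example the projection onto $\{x_2, x_4\}$ is the single equation $x_2 \oplus x_4 = 1$, and the
projection onto $\{x_1, x_2\}$ is $x_1 = 1$; the output never has more equations than the input.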

Proposition 2 If $\Sigma$ is represented by an affine formula containing $k$ equations and $\alpha$ by
a disjunction of $k'$ linear equations, and $A$ is a subset of $\mathit{Var}(\Sigma)$, then searching for a best
explanation for $\Pi = (\Sigma, \alpha, A)$ can be done in time $O((k + k')((k + 1)^2 + |A|(k + k'))n)$.

Sketch of proof It is easily seen that an affine formula (containing $k' + k$ equations and
$n$ variables) for $\Sigma \wedge \overline{\alpha}$ can be computed in time linear in the size of $\alpha$; this formula
can be projected onto $A$ in time $O((k + k')^2 n)$, and we straightforwardly get a disjunction of at
most $k + k'$ linear equations for $\overline{(\mathcal{M}(\Sigma \wedge \overline{\alpha}))|_A}$. Then we can use
distributivity of $\wedge$ over $\vee$ for solving the satisfiability problem of the algorithm; recall
that SAT can be solved in time $O(k^2 n)$ for an affine formula of $k$ equations over $n$ variables
by the elimination method of Gauss (Curtis, 1984). The remaining operations are
straightforward. □
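
The two negation steps used in this sketch only flip right-hand sides. As a small worked
illustration (the concrete query and the notation $e_i = b_i$, with $e_i$ a XOR of variables and
$b_i \in \{0, 1\}$, are ours and not taken from the note):

```latex
\begin{align*}
\alpha &= (x_1 \oplus x_2 = 0) \vee (x_3 = 1), \\
\overline{\alpha} &\equiv (x_1 \oplus x_2 = 1) \wedge (x_3 = 0), \\
\overline{(e_1 = b_1) \wedge \dots \wedge (e_j = b_j)}
  &\equiv (e_1 = b_1 \oplus 1) \vee \dots \vee (e_j = b_j \oplus 1).
\end{align*}
```

The second line shows why $\Sigma \wedge \overline{\alpha}$ is again affine, and the third why the complement of
the projected affine formula is a disjunction of linear equations, each of which can be
conjoined with $\Sigma$ and tested for satisfiability separately.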

Note that variables, literals and clauses are special cases of disjunctions of linear equations.

6.2 DNFs

Though the class of DNF formulas has very good computational properties, abduction 
remains a hard problem for it as a whole, even with additional restrictions. Recall that the 
TAUTOLOGY problem is the one of deciding whether a given DNF formula represents the 
identically true function, and that this problem is coNP-complete. 

Proposition 3 Deciding whether there is at least one explanation for a given abduction
problem $(\Sigma, \alpha, A)$ is NP-complete when $\Sigma$ is given in DNF, even if $\alpha$ is a variable and
$A \cup \{\alpha\} = \mathit{Var}(\Sigma)$.

Sketch of proof Membership in NP is obvious, since deduction with DNFs is polynomial;
now it is easily seen that $\Sigma$ is tautological if and only if the abduction problem
$(\Sigma \vee (x), x, \mathit{Var}(\Sigma))$ has no explanation, where $x$ is a variable not occurring in $\Sigma$
(see the DNF $\Sigma \vee (x)$ as the implication $\overline{\Sigma} \to x$); $\Sigma \vee (x)$ is in DNF, and we
get the result. □

However, when $\Sigma$ is represented by a DNF, projecting it onto $A$ is easy; indeed, the
properties of projection show that it suffices to cancel its literals that are not formed upon $A$.
Consequently, if $\varphi$ is such a DNF containing $k$ terms, then a DNF $\psi$ with
$\mathcal{M}(\psi) = (\mathcal{M}(\varphi))|_A$ and containing at most $k$ terms can be computed in time
$O(k |\mathit{Var}(\varphi)|)$.
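
This projection is short enough to be written down directly. In the sketch below, which is
an illustration of ours, a term is a dictionary from variables to signs (hence always
consistent), `project_dnf` and `dnf_models` are hypothetical names, and the brute-force model
enumeration serves only to check the result on a small example.

```python
from itertools import product

def project_dnf(terms, keep):
    """Cancel, in every term, the literals not formed upon `keep`."""
    return [{v: b for v, b in term.items() if v in keep} for term in terms]

def dnf_models(terms, variables):
    """Models of a DNF over `variables`, as tuples of truth values."""
    result = set()
    for bits in product([False, True], repeat=len(variables)):
        m = dict(zip(variables, bits))
        if any(all(m[v] == b for v, b in term.items()) for term in terms):
            result.add(bits)
    return result

variables = ["x1", "x2", "x3"]
# phi = (x1 AND NOT x2) OR (x2 AND x3)
phi = [{"x1": True, "x2": False}, {"x2": True, "x3": True}]
psi = project_dnf(phi, {"x1", "x3"})
print(psi)
```

On this example the projection onto $\{x_1, x_3\}$ is the DNF $(x_1) \vee (x_3)$, with as many terms
as the input.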

Thus we can show that some subclasses of the class of all DNFs allow polynomial
abduction. We state the first result quite generally, but note that its assumptions are
satisfied by natural classes of DNFs: e.g., that of Horn DNFs, i.e., those DNFs with at
most one positive literal per term; similarly, that of Horn-renamable DNFs, i.e., those
that can be turned into a Horn DNF by replacing some variables with their negation, and
simplifying double negations, everywhere in the formula; and that of 2DNFs, i.e., those DNFs
with at most two literals per term. We omit the proof of the following proposition, since it is
essentially the same as that of Proposition 2 (simply follow the execution of the algorithm).

Proposition 4 Let $\mathcal{D}$ be a class of DNFs that is stable under removal of occurrences of
literals and for which the TAUTOLOGY problem is polynomial. If $\Sigma$ is restricted to belong
to $\mathcal{D}$, $\alpha$ is a clause and $A$ is a subset of $\mathit{Var}(\Sigma)$, then searching for a best explanation for
$\Pi = (\Sigma, \alpha, A)$ can be done in polynomial time.

Thus we can establish that abduction is tractable if (among others) $\Sigma$ is in Horn-renamable
DNF (including the Horn and reverse Horn cases) or in 2DNF, and $\alpha$ is a clause.

Finally, let us point out that with a very similar proof we can obtain polynomiality
for some problems obtained by strengthening the restriction of Proposition 4 over $\Sigma$, but
weakening that over $\alpha$.

Proposition 5 If $\Sigma$ is represented by a Horn (resp. reverse Horn) DNF of $k$ terms and
$\alpha$ by a positive (resp. negative) CNF of $k'$ clauses, and $A$ is a subset of $\mathit{Var}(\Sigma)$, then
searching for a best explanation for $\Pi = (\Sigma, \alpha, A)$ can be done in time $O((k + |A|) k k' n)$.
The same holds if $\Sigma$ is represented by a positive (resp. negative) DNF of $k$ terms and $\alpha$ by
a Horn (resp. reverse Horn) CNF of $k'$ clauses.

Once again note that variables, literals and terms are all special cases of (reverse) Horn 
CNFs, and that variables, positive (resp. negative) clauses and positive (resp. negative) 
terms are all special cases of positive (resp. negative) CNFs. 

7. Discussion and Perspectives 

The general algorithm presented in this note allows us to derive new polynomial restrictions
of abduction problems; even if this is not discussed here, for lack of space, it also allows us
to unify some previously known such restrictions (such as $\Sigma$ in 2CNF and $\alpha$ in 2DNF, or $\Sigma$
in monotone CNF and $\alpha$ given as a clause). The following list summarizes the main new
polynomial restrictions:

• $\Sigma$ given as an affine formula and $\alpha$ as a disjunction of linear equations (Proposition 2)

• $\Sigma$ in Horn-renamable DNF and $\alpha$ given as a clause (Proposition 4)

• $\Sigma$ in 2DNF and $\alpha$ given as a clause (Proposition 4)

• $\Sigma$ in Horn (reverse Horn) DNF and $\alpha$ in positive (negative) CNF (Proposition 5)

• $\Sigma$ in negative (positive) DNF and $\alpha$ in reverse Horn (Horn) CNF (Proposition 5).

Moreover, even if there is no guarantee for efficiency in the general case, the presentation of
our algorithm does not depend on the syntactic form of $\Sigma$ or $\alpha$, and it uses only standard
operations on Boolean functions (projection, conjunction, negation).

Another interesting feature of this algorithm is that before minimization it computes
the explanations intensionally. Consequently, all the full explanations can be enumerated
with roughly the same delay as the models of the formula representing them ($\Sigma'$). However,
of course, there is no guarantee that two of them would not be minimized into the same
best explanation, which prevents us from concluding that our algorithm can enumerate all the
best explanations; trying to extend it in this direction would be an interesting problem.
For more details about enumeration we refer the reader to Eiter and Makino's work (Eiter
& Makino, 2002).

As identified by Selman and Levesque (1990), central to the task is the notion of projection
over a set of variables, and our algorithm isolates this subtask. However, our notion
of projection only concerns variables, and not literals, which prevents one from imposing a
sign on the literals the hypotheses are formed upon, contrary to more general formalizations
proposed for abduction, such as Marquis' (Marquis, 2000). Even if we think this is not a
prohibitive restriction, it would be interesting to try to fix that weakness of our algorithm
while preserving its polynomial classes.

Another problem of interest is the behaviour of our algorithm when $\Sigma$ and $\alpha$ are not
only propositional formulas, but more generally multivalued theories, in which the domain
of variables is not restricted to be $\{0, 1\}$: e.g., signed formulas (Beckert et al., 1999). This
framework is used, for instance, for configuration problems by Amilhastre et al. (2002). It
is easily seen that our algorithm is still correct in this framework; however, it remains
to be studied in which cases its running time is polynomial.

Finally, problems of great interest are those of deciding the relevance or the necessity of
an abducible (Eiter & Gottlob, 1995). An abducible $x$ is said to be relevant to an abduction
problem $\Pi$ if there is at least one best explanation for $\Pi$ containing $x$ or $\neg x$, and necessary
to $\Pi$ if all the best explanations for $\Pi$ contain $x$ or $\neg x$. It is easily seen that $x$ is necessary
for $\Pi = (\Sigma, \alpha, A)$ if and only if $\Pi' = (\Sigma, \alpha, A \setminus \{x\})$ has no explanation, hence showing
that polynomial restrictions for the search for explanations are polynomial as well for deciding
the necessity of a hypothesis as soon as they are stable under the substitution of $A \setminus \{x\}$
for $A$, which is the case for all restrictions considered in this note. By contrast, we do
not know of any such relation for relevance, and the study of this problem would also be of
great interest.
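
The necessity test described above amounts to one call to an explanation-existence check
with a smaller set of abducibles. The sketch below illustrates this by brute force; the
names `has_explanation` and `is_necessary`, the predicate encoding of formulas, and the two
example knowledge bases are assumptions of ours.

```python
from itertools import product

def has_explanation(sigma, alpha, variables, abducibles):
    """Existence test following Proposition 1, by enumeration of assignments."""
    select = lambda m: tuple(m[v] for v in abducibles)
    sigma_models = [m for m in (dict(zip(variables, bits))
                                for bits in product([False, True],
                                                    repeat=len(variables)))
                    if sigma(m)]
    blocked = {select(m) for m in sigma_models if not alpha(m)}
    return any(select(m) not in blocked for m in sigma_models)

def is_necessary(sigma, alpha, variables, abducibles, x):
    """x is necessary iff the problem with abducibles A \\ {x} has no explanation."""
    remaining = [v for v in abducibles if v != x]
    return not has_explanation(sigma, alpha, variables, remaining)

variables = ["p", "q", "r"]
alpha = lambda m: m["q"]
both = lambda m: not (m["p"] and m["r"]) or m["q"]                    # (p AND r) -> q
either = lambda m: (not m["p"] or m["q"]) and (not m["r"] or m["q"])  # p -> q, r -> q

print(is_necessary(both, alpha, variables, ["p", "r"], "p"))
print(is_necessary(either, alpha, variables, ["p", "r"], "p"))
```

With the first knowledge base the only best explanation is $p \wedge r$, so $p$ is necessary; with
the second, $r$ alone is a best explanation, so $p$ is not.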